1.
Genomics, Proteomics & Bioinformatics ; (4): 43-51, 2003.
Article in English | WPRIM | ID: wpr-339525

ABSTRACT

The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. To address this problem, we tried to identify repeats and mask them prior to assembly, even at the genome survey stage. Repeats of different copy number have different probabilities of appearing in shotgun data; based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly was faster and had a lower error probability. In addition, because clone-insert size affects the accuracy of repeat assembly and scaffold construction, we also designed the length distribution of clone-inserts using our model. In our simulated human and rice genomes, the length distributions of repeats differ, so their optimal clone-insert length distributions were not the same. Thus, with an optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.


Subject(s)
Animals , Humans , Cloning, Molecular , Genome , Genome, Human , Genomics , Methods , Models, Genetic , Models, Statistical , Models, Theoretical , Oryza , Genetics , Sequence Analysis, DNA
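The principle stated in the abstract above, that repeats of different copy number appear with different probabilities in shotgun data, can be illustrated with a minimal Python sketch. It is not the MDRmasker algorithm, whose model the abstract does not describe. It assumes, purely for illustration, that the number of times a single-copy k-mer is sampled follows a Poisson distribution with mean `kmer_depth`; the function names, the parameter `alpha`, and the choice of masking character are all hypothetical.

```python
import math
from collections import Counter


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def repeat_threshold(kmer_depth, alpha=0.001):
    """Smallest count that a single-copy k-mer reaches with probability < alpha.

    Assumption: single-copy k-mer counts are Poisson with mean kmer_depth.
    """
    k = 1
    while poisson_sf(k, kmer_depth) >= alpha:
        k += 1
    return k


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_read(read, counts, k, threshold):
    """Replace with 'N' every base covered by a k-mer seen >= threshold times."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] >= threshold:
            masked[i:i + k] = "N" * k
    return "".join(masked)
```

At higher coverage the threshold rises, because unique sequence is itself sampled more often; this is why criteria of this kind have to be stated per coverage level.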
2.
Genomics, Proteomics & Bioinformatics ; (4): 101-107, 2003.
Article in English | WPRIM | ID: wpr-339517

ABSTRACT

We report a complete genomic sequence of rare isolates (minor genotype) of the SARS-CoV from SARS patients in Guangdong, China, where the first few cases emerged. The most striking discovery from the isolate is an extra 29-nucleotide sequence located between nucleotide positions 27,863 and 27,864 (relative to the complete sequence of BJ01), within an overlapping region composed of BGI-PUP5 (BGI-postulated uncharacterized protein 5) and BGI-PUP6 upstream of the N (nucleocapsid) protein. The discovery of this minor genotype, GD-Ins29, suggests a significant genetic event and differentiates it from the previously reported genotype, the dominant form among all sequenced SARS-CoV isolates. A 17-nt segment of this extra sequence is identical to a segment of the same size in two human mRNA sequences that may interfere with viral replication and transcription in the cytosol of the infected cells. It provides a new avenue for the exploration of the virus-host interaction in viral evolution, host pathogenesis, and vaccine development.


Subject(s)
Base Sequence , China , Cluster Analysis , Gene Components , Genetic Variation , Genome, Viral , Genotype , Molecular Sequence Data , Phylogeny , Reverse Transcriptase Polymerase Chain Reaction , Severe acute respiratory syndrome-related coronavirus , Genetics , Sequence Analysis, DNA , Severe Acute Respiratory Syndrome , Genetics
3.
Genomics, Proteomics & Bioinformatics ; (4): 108-117, 2003.
Article in English | WPRIM | ID: wpr-339516

ABSTRACT

The corona-like spikes or peplomers on the surface of the virion, as seen under the electron microscope, are the most striking features of coronaviruses. The S (spike) protein, with 1,255 amino acids, is the largest structural protein encoded in the viral genome. Its structure can be divided into three regions: a long N-terminal region in the exterior, a characteristic transmembrane (TM) region, and a short C-terminus in the interior of a virion. We detected fifteen nucleotide substitutions by comparison with the seventeen published SARS-CoV genome sequences, eight (53.3%) of which are non-synonymous mutations leading to amino acid alterations with predicted physicochemical changes. The possible antigenic determinants of the S protein are predicted, and the result is confirmed by ELISA (enzyme-linked immunosorbent assay) with synthesized peptides. Another profound finding is that three disulfide bonds are defined at the C-terminus with the N-terminus of the E (envelope) protein, based on the typical sequence and positions, thus establishing, if confirmed, a structural connection between these two important structural proteins. Phylogenetic analysis reveals several conserved regions that might be potent drug targets.


Subject(s)
Amino Acid Sequence , Antigens, Viral , Allergy and Immunology , Base Composition , Computational Biology , Enzyme-Linked Immunosorbent Assay , Membrane Glycoproteins , Genetics , Molecular Sequence Data , Mutation , Genetics , Phylogeny , Protein Structure, Tertiary , Severe acute respiratory syndrome-related coronavirus , Genetics , Allergy and Immunology , Sequence Analysis, DNA , Sequence Homology , Spike Glycoprotein, Coronavirus , Viral Envelope Proteins , Genetics , Metabolism
4.
Genomics, Proteomics & Bioinformatics ; (4): 118-130, 2003.
Article in English | WPRIM | ID: wpr-339515

ABSTRACT

We studied structural and immunological properties of the SARS-CoV M (membrane) protein, based on comparative analyses of sequence features, phylogenetic investigation, and experimental results. The M protein is predicted to contain a triple-spanning transmembrane (TM) region, a single N-glycosylation site near its N-terminus that is in the exterior of the virion, and a long C-terminal region in the interior. The M protein harbors a higher substitution rate (0.6% correlated to its size) among viral open reading frames (ORFs) from published data. The four substitutions detected in the M protein, which cause non-synonymous changes, can be classified into three types. One of them results in changes of pI (isoelectric point) and charge, affecting antigenicity. The second changes hydrophobicity of the TM region, and the third one relates to hydrophilicity of the interior structure. Phylogenetic tree building based on the variations of the M protein appears to support the non-human origin of SARS-CoV. To investigate its immunogenicity, we synthesized eight oligopeptides covering 69.2% of the entire ORF and screened them by using ELISA (enzyme-linked immunosorbent assay) with sera from SARS patients. The results confirmed our predictions on antigenic sites.


Subject(s)
Amino Acid Sequence , Base Sequence , Cluster Analysis , Enzyme-Linked Immunosorbent Assay , Immunoassay , Molecular Sequence Data , Mutation , Genetics , Oligopeptides , Phylogeny , Protein Structure, Tertiary , Severe acute respiratory syndrome-related coronavirus , Genetics , Sequence Alignment , Sequence Analysis, DNA , Viral Matrix Proteins , Chemistry , Genetics , Allergy and Immunology
5.
Genomics, Proteomics & Bioinformatics ; (4): 155-165, 2003.
Article in English | WPRIM | ID: wpr-339512

ABSTRACT

The R (replicase) protein is the uniquely defined non-structural protein (NSP) responsible for RNA replication, mutation rate or fidelity, and regulation of transcription in coronaviruses and many other ssRNA viruses. Based on our complete genome sequences of four isolates (BJ01-BJ04) of SARS-CoV from Beijing, China, we analyzed the structure and predicted functions of the R protein in comparison with 13 other isolates of SARS-CoV and 6 other coronaviruses. The entire ORF (open reading frame) encodes two major enzyme activities, RNA-dependent RNA polymerase (RdRp) and proteinase activities. The R polyprotein undergoes a complex proteolytic process to produce 15 function-related peptides. A hydrophobic domain (HOD) and a hydrophilic domain (HID) are newly identified within NSP1. The substitution rate of the R protein is close to the average of the SARS-CoV genome. The functional domains in all NSPs of the R protein give different phylogenetic results, which suggests that they have different mutation rates under selective pressure. Eleven highly conserved regions in RdRp and twelve cleavage sites by 3CLP (chymotrypsin-like protein) have been identified as potential drug targets. Findings suggest that it is possible to obtain information about the phylogeny of SARS-CoV, as well as potential tools for drug design, genotyping and diagnostics of SARS.


Subject(s)
Amino Acid Sequence , Base Composition , Base Sequence , Cluster Analysis , Computational Biology , Conserved Sequence , Genetics , Evolution, Molecular , Gene Components , Genome, Viral , Molecular Sequence Data , Mutation , Genetics , Phylogeny , Protein Structure, Tertiary , RNA-Dependent RNA Polymerase , Genetics , Severe acute respiratory syndrome-related coronavirus , Genetics , Sequence Analysis, DNA
6.
Genomics, Proteomics & Bioinformatics ; (4): 180-192, 2003.
Article in English | WPRIM | ID: wpr-339508

ABSTRACT

Beijing has been one of the epicenters hit most severely by the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) since the first patient was diagnosed in one of the city's hospitals. We now report complete genome sequences of the BJ Group, including four isolates (Isolates BJ01, BJ02, BJ03, and BJ04) of the SARS-CoV. It is remarkable that all members of the BJ Group share a common haplotype, consisting of seven loci that differentiate the group from other isolates published to date. Among 42 substitutions uniquely identified from the BJ Group, 32 are non-synonymous changes at the amino acid level. Rooted phylogenetic trees, proposed on the basis of haplotypes and other sequence variations of SARS-CoV isolates from Canada, USA, Singapore, and China, gave rise to different paradigms but positioned the BJ Group, together with the newly discovered GD01 (GD-Ins29), in the same clade, followed by the H-U Group (from Hong Kong to USA) and the H-T Group (from Hong Kong to Toronto), leaving the SP Group (Singapore) more distant. This result appears to suggest a possible transmission path from Guangdong to Beijing/Hong Kong, then to other countries and regions.


Subject(s)
Humans , Genome, Viral , Haplotypes , Mutation , Open Reading Frames , Phylogeny , Severe acute respiratory syndrome-related coronavirus , Genetics
7.
Genomics, Proteomics & Bioinformatics ; (4): 216-225, 2003.
Article in English | WPRIM | ID: wpr-339504

ABSTRACT

Knowledge of the evolution of pathogens is of great medical and biological significance to the prevention, diagnosis, and therapy of infectious diseases. In order to understand the origin and evolution of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus), we collected complete genome sequences of all viruses available in GenBank, and made comparative analyses with the SARS-CoV. Genomic signature analysis demonstrates that all coronaviruses except the SARS-CoV have TGTT as their most abundant tetranucleotide. A detailed analysis of the forty-two complete SARS-CoV genome sequences revealed the existence of two distinct genotypes, and showed that these isolates could be classified into four groups. Our manual analysis of the BLASTN results demonstrates that the HE (hemagglutinin-esterase) gene exists in the SARS-CoV, although many mutations have made it difficult to recognize.


Subject(s)
Amino Acid Motifs , Amino Acid Substitution , Base Composition , Codon , Genetics , Computational Biology , DNA Mutational Analysis , Evolution, Molecular , Gene Transfer, Horizontal , Genetic Variation , Genome, Viral , Phylogeny , Severe acute respiratory syndrome-related coronavirus , Genetics
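The tetranucleotide observation in the abstract above rests on counting overlapping 4-letter words along a genome. The following is a minimal Python sketch of that counting step only; it uses raw counts, whereas a full genomic signature analysis may normalize against expected frequencies, and the abstract does not say which variant the authors used. The function names are hypothetical, and sequences are assumed to be given in the DNA alphabet (A, C, G, T).

```python
from collections import Counter


def tetranucleotide_counts(seq):
    """Count overlapping 4-mers, skipping windows with ambiguous bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def richest_tetranucleotide(seq):
    """Return (word, count) for the most frequent 4-mer, or None if none exist."""
    counts = tetranucleotide_counts(seq)
    return counts.most_common(1)[0] if counts else None
```

Applied to each genome in a collection, `richest_tetranucleotide` gives one word per genome, which is the form of comparison the abstract describes.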
8.
Genomics, Proteomics & Bioinformatics ; (4): 226-235, 2003.
Article in English | WPRIM | ID: wpr-339503

ABSTRACT

Annotation of the genome sequence of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) is indispensable for understanding its evolution and pathogenesis. We have performed a full annotation of the SARS-CoV genome sequences by using annotation programs that are publicly available or that we developed ourselves. In total, 21 open reading frames (ORFs) of genes or putative uncharacterized proteins (PUPs) were predicted. Seven PUPs had not been reported previously, and two of them were predicted to contain transmembrane regions. Eight ORFs partially overlapped with, or were embedded in, those of known genes, revealing that the SARS-CoV genome is small and compact, with overlapping coding regions. The most striking discovery is that one ORF is located on the minus strand. We have also annotated non-coding regions and identified the transcription regulating sequences (TRS) in the intergenic regions. The analysis of TRS supports the minus-strand extending transcription mechanism of coronavirus. The SNP analysis of different isolates reveals that mutations of the sequences do not affect the prediction results of ORFs.


Subject(s)
Amino Acid Substitution , Base Composition , Base Sequence , Computational Biology , Methods , Genome, Viral , Isoelectric Point , Models, Genetic , Molecular Sequence Data , Molecular Weight , Open Reading Frames , Severe acute respiratory syndrome-related coronavirus , Genetics , Sequence Analysis , Transcription, Genetic
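ORF prediction on both strands, as mentioned in the abstract above, can be sketched in its simplest form as a scan of all six reading frames for ATG-to-stop stretches. This Python sketch is not the annotation pipeline used by the authors, which the abstract does not specify; the function names and the `min_codons` parameter are hypothetical, and the standard stop codons TAA, TAG, and TGA are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=2):
    """Scan six reading frames for ORFs running from ATG to the first stop.

    Returns (strand, start, end) tuples. Coordinates are 0-based, end is
    exclusive and includes the stop codon, and both refer to the scanned
    strand (for "-" that is the reverse complement). min_codons counts the
    codons from ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Overlapping or embedded ORFs of the kind the abstract reports show up here as tuples whose coordinate ranges intersect, either in different frames of the same strand or across strands.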